
No token padding for train_network #1677

Open
wants to merge 1 commit into
base: dev

Conversation

@gesen2egee (Contributor) commented Oct 7, 2024

When training with Illustrious-xl 0.1, it appears that token padding was disabled during its training process.
https://arxiv.org/pdf/2409.19946
Illustrious: an Open Advanced Illustration Model, Appendix A.6

Testing with token padding enabled seems to result in worse outcomes.
To address this, I've added the --no_token_padding option to train_network, allowing users to disable token padding during training for potentially better results.

[Comparison image: prompt "1girl, solo", no negative prompt]
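For illustration, here is a minimal sketch of the difference between padded and unpadded tokenization, assuming the Hugging Face CLIPTokenizer (this is not the PR's actual code; the checkpoint name and prompt are only examples):

```python
# Minimal sketch: padded vs. unpadded tokenization with the HF CLIPTokenizer.
# Not the PR's implementation; checkpoint name and prompt are illustrative.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
prompt = "1girl, solo"

# Usual behaviour: pad every caption to 77 tokens (BOS + tokens + EOS + pads)
# so captions of different lengths can be stacked into one batch tensor.
padded = tokenizer(prompt, padding="max_length", max_length=77, truncation=True)
print(len(padded["input_ids"]))    # 77

# "No token padding": keep only the real tokens (BOS + prompt tokens + EOS).
unpadded = tokenizer(prompt, truncation=True)
print(len(unpadded["input_ids"]))  # just the actual token count
```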

@kohya-ss (Owner) commented Oct 7, 2024

A.6 states the following:

During training, text encoder outputs must be padded to be packed in batch.

According to this, token padding is performed during training. I believe A.6 is reporting that it is better not to apply padding for CFG at inference, rather than during training.

Also, disable_token_padding is going to be dropped in the sd3 branch because it requires complex processing.
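As a generic PyTorch illustration of that constraint (not sd-scripts code; the token ids and pad id are made up), sequences of different lengths cannot be stacked into one batch tensor, which is why captions are padded to a common length such as 77:

```python
# Generic sketch of why captions must be padded to a common length for batching.
# Not sd-scripts code; the token ids and pad id below are illustrative only.
import torch
import torch.nn.functional as F

ids_a = torch.tensor([49406, 320, 1611, 49407])            # short caption
ids_b = torch.tensor([49406, 320, 1611, 267, 786, 49407])  # longer caption

# torch.stack([ids_a, ids_b]) would fail: stacked tensors need the same shape.

max_len = 77
pad_id = 49407  # illustrative; the real pad id depends on the tokenizer config
batch = torch.stack([
    F.pad(t, (0, max_len - t.numel()), value=pad_id) for t in (ids_a, ids_b)
])
print(batch.shape)  # torch.Size([2, 77])
```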

@feffy380 (Contributor) commented Nov 2, 2024

As far as I can tell, the Illustrious report is saying that the padding tokens unintentionally learned information about the images, which causes adverse effects when they are included during inference, especially since most UIs optimize CFG by inserting padding so that cond and uncond can be batched together.

As mentioned above, everything must be padded to the same length for batching to work, so the fix they suggest is masking out the padding tokens during training.
@kohya-ss, when masking is implemented for SD3, could this also be extended to SDXL, since they use the same CLIP encoders?
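To make the masking idea concrete, here is a rough sketch of zeroing out the text-encoder outputs at padded positions before they are used as cross-attention context; this is only my reading of the general technique, not the sd3-branch implementation, and the checkpoint name is an example:

```python
# Rough sketch: zero out CLIP text-encoder outputs at padded positions so the
# padding embeddings carry no signal into cross-attention during training.
# Illustrates the general idea only, not the sd3-branch implementation.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(
    ["1girl, solo"], padding="max_length", max_length=77,
    truncation=True, return_tensors="pt",
)
with torch.no_grad():
    hidden = text_encoder(tokens.input_ids).last_hidden_state  # (1, 77, 768)

# attention_mask is 1 for BOS/prompt/EOS tokens and 0 for padding positions.
mask = tokens.attention_mask.unsqueeze(-1).to(hidden.dtype)    # (1, 77, 1)
masked_hidden = hidden * mask  # padded rows become zero vectors
```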

@kohya-ss (Owner) commented Nov 2, 2024

@kohya-ss, when masking is implemented for SD3, could this also be extended to SDXL, since they use the same CLIP encoders?

I hadn't thought of that, but it's certainly possible. However, unlike SD3, there is an option to extend the token length to 150/225 in a somewhat hacky way, so it seems the implementation cannot be copied over as-is...
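For context on that option, here is a simplified sketch of the long-prompt trick as I understand it: the token ids are split into 75-token chunks, each chunk is wrapped in BOS/EOS and encoded separately, and the hidden states are concatenated. A padding mask would then have to be tracked per chunk, which is presumably part of why the SD3 masking code cannot be reused directly. This is a paraphrase of the general technique, not sd-scripts' actual code:

```python
# Simplified sketch of the 150/225-token "long prompt" handling: split the ids
# into 75-token chunks, wrap each chunk in BOS/EOS, encode them separately,
# and concatenate the hidden states. A padding mask would need to be tracked
# per chunk. Paraphrase of the general technique, not sd-scripts' code.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

long_prompt = ", ".join(["very long tag list"] * 60)
ids = tokenizer(long_prompt, truncation=False).input_ids[1:-1]  # drop BOS/EOS

bos, eos = tokenizer.bos_token_id, tokenizer.eos_token_id
chunks = []
for i in range(0, min(len(ids), 225), 75):
    chunk = ids[i:i + 75]
    chunk = [bos] + chunk + [eos] + [eos] * (75 - len(chunk))  # pad to 77
    chunks.append(torch.tensor(chunk).unsqueeze(0))

with torch.no_grad():
    hidden = torch.cat(
        [text_encoder(c).last_hidden_state for c in chunks], dim=1
    )  # (1, 77 * num_chunks, 768) context for cross-attention
```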
